All Questions
5 questions
1 vote, 3 answers, 1k views
Comparing whether two very large text contents are different or not efficiently
I have a MySQL database with a column Body MEDIUMTEXT. Until now, I only stored content in it; users of the application had no update option. Now I want to add an update ...
0 votes, 1 answer, 117 views
I have a data set of over a million addresses and I want to display the closest N locations to a given address or current location
I am a student working on a personal project which is essentially a location finder that will be on a website. I have a data set of over a million addresses and I want to display the closest N ...
7 votes, 3 answers, 2k views
System Design: Very Large CSV Imports Every Month
We have a webapp that will rely on large CSVs from external vendors every month. By large, I mean around 6 GB each, so a few million rows, and probably 2-5 CSVs. This webapp will also allow ...
6 votes, 3 answers, 2k views
Deduplication of complex records / Similarity Detection
I'm working on a project that involves records with fairly large numbers of fields (~15-20) and I'm trying to figure out a good way to implement deduplication. Essentially the records are people along ...
3 votes, 1 answer, 311 views
Processing every leaf under a node in a tree efficiently
Short version: In a tree (non-binary) with many levels of children, where each node can have multiple leaves, what is the best way to tally leaves that meet a certain condition given a node? Long, ...